The following coursebook is the main part for Online Data Science Series: Visualizing Geospatial Data in R workshop produced by the team at Algoritma . The coursebook is intended for a restricted audience only, i.e. the individuals having received this coursebook directly from the training organization. It may not be reproduced, distributed, translated or adapted in any form outside these individuals and organizations without permission.

Algoritma is a data science education center based in Jakarta. We organize workshops and training programs to help working professionals and students gain mastery in various data science sub-fields: data visualization, machine learning, data modeling, statistical inference, etc.

Before you go ahead and run the codes in this coursebook, it’s often a good idea to go through some initial setup. Under the Training Objectives section we’ll outline the syllabus, identify the key objectives and set up expectations for each module. Under the Libraries and Setup section you’ll see some code to initialize our workspace and the libraries we’ll be using for the projects. You may want to make sure that the libraries are installed beforehand by referring back to the packages listed here.

1 Preface

1.1 Introduction

Geospatial data is data about objects, events, or phenomena that have a location on the surface of the earth. The data combines location information (usually coordinates on the earth), attribute information (the characteristics of the object, event, or phenomena concerned), and often also temporal information (the time or life span at which the location and attributes exist)1.

Geospatial analysis uses this data to build maps, graphs, statistics, and cartograms, making complex relationships understandable. It introduces a formal techniques using geographic and geometry properties of a data. There are various business problems that can be solved using geospatial analysis, namely a few:

  • Warehouse logistics optimization
  • Market growth analysis for different regions
  • Sales demand forecasting
  • Work resource management

There are actually a plethora of tools that can visualize geographic information from full-scale GIS (Geographic Information System) applications such as ArcGIS and QGIS to web-based tools like Google maps to any number of programing languages. But, with advantages and disadvantages to these different types of tools, using a command-line interface has a steep learning curve, but it has the benefit of enabling approaches to analysis and visualization that are customizable, transparent, and reproducible2.

This 4-days online workshop is a beginner-friendly introduction to Geospatial Analysis in R. By visualizing geospatial data, users can have more intuitive decision making by contextualizing data in the real world, comparing informations across the city, state, or country.

1.2 Training Objectives

This is the first online data science series course of Geospatial Analysis in R. The primary objective of this course is to provide a participant a comprehensive introduction about tools and software for visualizing a geospatial data using the popular open-source tools: R. The material will covers:

Introductory Module:

  • Tools Introduction
    • R and R Studio
    • Open source packages
    • Using R Markdown
    • R Programming Basics
  • Data Wrangling with R’s tidyverse
    • Working with tabular data in R: Tips and Techniques
    • Data Wrangling and Aggregation
    • Introduction to visualization with ggplot2

Main Module:

  • Building Indonesia Static Map
    • Retrieving Indonesia spatial vector from an open source provider
    • Working with spatial polygon in R
    • Grammar of Graphics for geospatial data using ggplot2
    • Enhancing map plots for richer visualization
  • Creating Interactive Map
    • Using leaflet - a JavaScript API for creating interactive maps
    • Adding markers and colors in leaflet
    • Building various geospatial analysis graphics: Heatmap, Choropleth, and Connection Map
  • Publishing your visualization
    • Explore various publication options using rmarkdown versatile output
    • Create an awesome and easy-to-build dashboard using flexdashboard package
    • Present your geospatial analysis for various industries business solution

1.3 Library and Setup

In this Library and Setup section you’ll see some code to initialize our workspace, and the packages we’ll be using for this project.

Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.

You will need to use install.packages() to install any packages that are not yet downloaded onto your machine. To install packages, type the command below on your console then press ENTER.

## DO NOT RUN CHUNK
# packages <- c("tidyverse", "maps","rgdal","tmap","sf")
# 
# install.packages(packages)

Then you need to load the package into your workspace using the library() function. Special for this course, the rmarkdown packages do not need to be called using library().

# package for data wrangling & vis
library(tidyverse)
library(glue)
library(scales)

# package for spatial environment in R
library(sf)

# package for visualization
library(leaflet)
library(plotly)

2 Geospatial Analysis in R

Geospatial analysis, as the main topic of this workshop, is a domain of analysis that focuses on data processing that is associated with geographic data. With unlisted potentials for many business domain analysis, in this main section of the workshop, we will try to learn the building blocks of geospatial data and combines it with a business use case example in order to visualize our data as an insightful map.

We will explore how to use and obtain geospatial data in R to create some of the most popular types of thematic maps; choropleth (static and interactive) and interactive heatmap.

2.1 Spatial Data Types & Formats

Let’s have a quick glimpse on a simple geospatial visualization in R by running the code chunk below:

maps::map("world", fill = TRUE, col = "darkseagreen")

The code above projected countries around the world as a static map. But, how does geographic data information stored? How could we obtain detail information on geographic contours or territorial borders?

Illustration on Vector vs. Raster

Figure 2.1: Illustration on Vector vs. Raster

The two most widely used geographic data models for spatial data are vector and raster. A vector data represents location and shape of geographic features through geometric shapes such as points, lines and polygons. Raster data, on the other hand, consists of values within a grid system. You can imagine raster data as a pixelated digital image such as a satellite imagery or a scanned map.

There are lots of data formats that can be used to store each data model. KML, GeoJSON, GeoTIFF, Tab File, are some of the common formats you might ever came accross. However, the most common format in geographic information system mapping is the Shapefile.

Shapefile is the universal standard of geospatial format developed and regulated by Esri, the international supplier of geographic information system softwares. The type format is then adopted by so many programming languages, including R.

The name Shapefile might be a little deceptive since the file is commonly made up from these separate files:

  • .shp (mandatory): includes the geometry data.
  • .shx (mandatory): includes the index data used to identify different geometries.
  • .dbf (mandatory): includes attribute information of each geometry’s data.
  • .prj includes the information of the coordinate reference system

Other than the listed files above, a shapefile may includes other file components. A comprehensive list of shapefile components can be accessed here.

For this workshop, I have provided a shapefile for Indonesian territory under shp folder of this directory. Notice that inside the directory, there is also a gadm36_IDN_3.zip which actually the original Indonesian shapefile provided by GADM. If you go ahead to the website you can pick out different countries spatial vector. Indonesia’s spatial vector is provided up to 4 levels of granularity. In this course we’ll be working with 3 levels of spatial vector that contains: Provinsi, Kabupaten and Kecamatan.


Extra Tip:
You can also access GADM database directly from R with the help of GADMTools library:

# Code example to load Indonesia's spatial vector in Kabupaten/Kota level  
library(GADMTools)  
library(sf)  
map <- gadm_sf_loadCountries(c("IDN"), level=2, basefile = "./")  
plotmap(map)  

2.2 R’s Spatial Ecosystem

Let’s read in the files using st_read() function from sf package:

idn <- st_read(dsn = "shp", layer = "idn")
#> Reading layer `idn' from data source `/home/tanesya/Documents/Algo: 5. Others/DSS - Geospatial/shp' using driver `ESRI Shapefile'
#> Simple feature collection with 6694 features and 16 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 95.0097 ymin: -11.00762 xmax: 141.0194 ymax: 6.076941
#> geographic CRS: WGS 84

sf represents “Simple Features” as records in a data.frame or tibble with a geometry list-column3. Simple Features is a hierarchical data model that represents a wide range of geometry types. Of 17 geometry types supported by the specification, only 7 are used in the vast majority of geographic research 4:

Source:[Geocomputation with R](https://geocompr.robinlovelace.net/)

Figure 2.2: Source:Geocomputation with R

If you use the class() function for our newly created idn object you can retrieved how the sf class handles spatial data in similar way as any tabular data structure; by stores it as a dataframe:

class(idn)
#> [1] "sf"         "data.frame"

The main difference between a regular dataframe an an sf object is that it has geometry which describes where on Earth the feature is located, and they have attributes, which describe other properties:

idn$geometry
#> Geometry set for 6694 features 
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 95.0097 ymin: -11.00762 xmax: 141.0194 ymax: 6.076941
#> geographic CRS: WGS 84
#> First 5 geometries:

To get a better idea about geometry information in an sf object, let’s go ahead and plot idn$geometry using plot() function:

plot(idn$geometry)

There are several informations that is stored in an sf’s geometry:

  • Geometry Type: There are seven most common geometry types in simple feature: POINT, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON and GEOMETRYCOLLECTION.
  • Dimension: Refers to a 2-(XY), 3-(XYZ/XYM) or 4-dimensional (XYZM) space.
  • Bbox (Bounding Box): an area defined by two longitudes and two latitudes.
  • CRS: Coordinate Reference System.

A CRS is a fundamental component of geospatial data. It models earth surface into a mathematical model. Intuitively, you could think of it as a way to model a 3 dimensional surfaces such as earth, into a 2 dimensional surface that is commonly used in geospatial analysis: making maps, distance calculation, etc. Take a look at the following images for better illustration:

Source:[DataCarpentry](https://datacarpentry.org/)

Figure 2.3: Source:DataCarpentry

If we were using geospatial data from different sources, it is important to make sure the data we are using has the same CRS attribute. A different CRS would not represented in the same mathematical space if combined and would alter any calculation done on the data significantly.

2.3 Working with Vector Attributes

The big advantage of using sf is how you each functions can be combined using %>% operator and works well with the tidyverse collection of R packages, which gives us better control over the geometry information in sf objects. For example, to subset certain provinces, you can use filter() method just like how you work with regular dataframe:

bali <- idn %>% 
  filter(NAME_1 == "Bali")

plot(bali$geometry)

You can also perform aggregation operations over the geometry:

bali_prov <- bali %>% 
  group_by(NAME_1) %>% 
  summarise()

plot(bali_prov$geometry)

Dive Deeper

Using the same method as above, try to plot a map which shown Indonesian province of North Sumatra in the city (Kabupaten/Kota) level!

## Your code here 

Other than sf’s simple features, there are actually other methodologies for storing model of geographical features into R. If you ever ran into a geospatial studies in R, you may also familiar with the use of sp package. In fact, sp was a very-well developed package since 2005 which practically supported almost every GIS analysis in R, even until now.

The main problem of sp is its low compatibility to R dataframe structure. sf was built to fill the gap. Released in 2016, sf uses OGC (Open Geospatial Consortium) & ISO standard on recording and structuring spatial data with simple features.

The disadvantage of sf is, since it is relatively new, some spatial packages may have not yet added support for sf object. Fortunately, we can stil convert an sf object to spatial class used in sp:

library(sp)
idn_sp <- as(idn, Class = "Spatial")

class(idn_sp)
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"

Spatial objects can be converted back to sf in the same way or with sf’s st_as_sf() function:

idn_sf <- st_as_sf(idn_sp)

class(idn_sf)
#> [1] "sf"         "data.frame"

Notes:

If you want to learn more about sp and how to work with spatial data using sp ecosystem, you can refer to “sp-example.Rmd” in this directory.


3 Hands On: Jakarta House Pricing Distribution

3.1 Data Preprocessing

Now let’s head back to our listings.csv, a data consists of properties listed for sale in Jabodetabek area:

df <- read.csv("data/listings.csv")
head(df)
#>   id          kota     kecamatan       harga KT KM  m2
#> 1  1 Jakarta Pusat        Gambir  3900000000  3  2 169
#> 2  2 Jakarta Pusat     Kemayoran  1450000000  3  2 160
#> 3  3 Jakarta Pusat        Gambir 11100000000 10  8 720
#> 4  4 Jakarta Pusat     Kemayoran   485000000  2  1  35
#> 5  5 Jakarta Utara Tanjung Priok  5600000000  6  5 240
#> 6  6 Jakarta Barat     Tamansari  3000000000 10  5 180

Here, we want to compare the house pricing for each sub-district (Kecamatan) level. Since the size of the listed properties may vary, we’ll use the price per square meter instead:

df_agg <- df %>% 
  mutate(
    harga_m2 = harga / m2
  ) %>% 
  group_by(kota, kecamatan) %>% 
  summarise(harga_m2 = median(harga_m2),
            total_listings= n()) %>% 
  ungroup()

head(df_agg)
#> # A tibble: 6 x 4
#>   kota          kecamatan         harga_m2 total_listings
#>   <chr>         <chr>                <dbl>          <int>
#> 1 Bekasi        Tarumajaya       12281250             463
#> 2 Depok         Beji             11746032.             84
#> 3 Depok         Cimanggis        11679365.            110
#> 4 Depok         Cinere           12761905.             88
#> 5 Jakarta Barat Cengkareng       13040000             455
#> 6 Jakarta Barat Grogolpetamburan 15032680.            223

To project our housing data into a map, we need to equip each observation in the data with geographical information representation. We can do this by joining the information of the city (kota) & sub-district (kecamatan) to NAME_2 & NAME_3 variables respectively in idn:

idn %>% 
  as.data.frame() %>% 
  head()
#>   GID_0    NAME_0   GID_1 NAME_1 NL_NAME_1     GID_2     NAME_2 NL_NAME_2
#> 1   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 2   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 3   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 4   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 5   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 6   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#>         GID_3           NAME_3 VARNAME_3 NL_NAME_3    TYPE_3    ENGTYPE_3
#> 1 IDN.1.2.1_1 Arongan Lambalek      <NA>      <NA> Kecamatan Sub-district
#> 2 IDN.1.2.2_1            Bubon      <NA>      <NA> Kecamatan Sub-district
#> 3 IDN.1.2.3_1   Johan Pahlawan      <NA>      <NA> Kecamatan Sub-district
#> 4 IDN.1.2.4_1        Kaway Xvi      <NA>      <NA> Kecamatan Sub-district
#> 5 IDN.1.2.5_1         Meureubo      <NA>      <NA> Kecamatan Sub-district
#> 6 IDN.1.2.6_1  Pantai Ceuremen      <NA>      <NA> Kecamatan Sub-district
#>      CC_3 HASC_3                       geometry
#> 1 1107062   <NA> MULTIPOLYGON (((95.97953 4....
#> 2 1107061   <NA> MULTIPOLYGON (((96.16601 4....
#> 3 1107050   <NA> MULTIPOLYGON (((96.13205 4....
#> 4 1107080   <NA> MULTIPOLYGON (((96.16397 4....
#> 5 1107081   <NA> MULTIPOLYGON (((96.25119 4....
#> 6 1107082   <NA> MULTIPOLYGON (((96.18817 4....

To join two dataframes in R, we can use the *_join() function from dplyr package:

df_agg <- df_agg %>% 
  left_join(idn, by = c("kota" = "NAME_2", "kecamatan" = "NAME_3"))

head(df_agg)
#> # A tibble: 6 x 19
#>   kota  kecamatan harga_m2 total_listings GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1
#>   <chr> <chr>        <dbl>          <int> <chr> <chr>  <chr> <chr>  <chr>    
#> 1 Beka… Tarumaja…   1.23e7            463 IDN   Indon… IDN.… Jawa … <NA>     
#> 2 Depok Beji        1.17e7             84 IDN   Indon… IDN.… Jawa … <NA>     
#> 3 Depok Cimanggis   1.17e7            110 IDN   Indon… IDN.… Jawa … <NA>     
#> 4 Depok Cinere      1.28e7             88 IDN   Indon… IDN.… Jawa … <NA>     
#> 5 Jaka… Cengkare…   1.30e7            455 IDN   Indon… IDN.… Jakar… <NA>     
#> 6 Jaka… Grogolpe…   1.50e7            223 IDN   Indon… IDN.… Jakar… <NA>     
#> # … with 10 more variables: GID_2 <chr>, NL_NAME_2 <chr>, GID_3 <chr>,
#> #   VARNAME_3 <chr>, NL_NAME_3 <chr>, TYPE_3 <chr>, ENGTYPE_3 <chr>,
#> #   CC_3 <chr>, HASC_3 <chr>, geometry <MULTIPOLYGON [°]>

Now that our df_agg data have the geographic attributes attached, we can turn it into an sf object using st_as_sf() function:

df_agg <- st_as_sf(df_agg)

df_agg$geometry
#> Geometry set for 59 features 
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 106.6325 ymin: -6.39886 xmax: 107.0317 ymax: -6.0409
#> geographic CRS: WGS 84
#> First 5 geometries:

3.2 Building Maps in R

3.2.1 Creating Maps with ggplot2

R is loaded with built-in tools and open source packages to help us turning both sp and sf objects into a neat map visualization. In the first session of this workshop, you have learned about a widely used and powerful plotting library for R, ggplot2. The great thing is that ggplot can plot sf objects directly by using geom_sf.

Recall about the ggplot2 layering system and see the code below:

ggplot(df_agg)+ # add base canvas
  geom_sf() # plot the geometry

You can also easily map the color of harga_m2 by adding the variable inside the aes() in the geom element:

ggplot(df_agg)+
  geom_sf(aes(fill = harga_m2)) 

Dive Deeper

Recall how you can use labs() in ggplot to add label informations, such as title, subtitle, etc. Copy down the previous code above, and provide the map with appropriate labellings on the chunk below!

## Your Code Here

With small adjustments and feature additions, we have built a nice geographic representation of house pricing in Jakarta. Notice that in the chunk below, I added a layer of theme_algo_map(), which was a customized map I created for this workshop. (You can also create your own theme too!)

The modularity of ggplot2 also allows us to save it as an external file, makes it easy for us to make a reproducible custom theming. You can find the code for theme_algo_map() under assets/theme_algo.R.

source('assets/theme_algo.R')

plot <- ggplot(df_agg)+
  geom_sf(aes(fill = harga_m2)) +
  labs(title = "Distribusi Harga Rumah di Jabodetabek",
       subtitle = "Hasil sample harga di marketplace properti",
       caption = "Sample menggunakan ±10,000 listing rumah di laman jual-beli rumah OLX\n September 2020",
       fill = "Harga/m2")+
  scale_fill_gradient(low = "lavenderblush", high = "red3",
                      labels = number_format(scale = 1/1e6, suffix = " Jt.", accuracy = 1))+
  theme_algo_map()

plot

Another benefit of creating maps with ggplot2 is how we can easily add a level of interactivity by simply use ggplotly() function from the plotly package.

Since we have stored our most recent plot creation as an object named plot, let’s try to wrap it with ggplotly(plot).

(P.S. If you haven’t install the plotly package, you can run install.packages("plotly") through your console)

ggplotly(plot)

You can also add another adjustments such as customizing the tooltip and hide the mode bar for cleaner appearance:

# create new column to store the tooltip information
df_agg <- df_agg %>% 
  mutate(text = glue("{kecamatan}, {kota}: <br> {number(harga_m2, scale = 1/1e6, suffix = 'Jt.', accuracy = .01)}"))

# re-run the function to create plot
plot <- ggplot(df_agg, aes(text = text))+ # add new aes mapping: text to define the tooltip text
  geom_sf(aes(fill = harga_m2)) +
  labs(title = "Distribusi Harga Rumah di Jabodetabek",
       subtitle = "Hasil sample harga di marketplace properti",
       caption = "Sample menggunakan ±10,000 listing rumah di laman jual-beli rumah OLX\n September 2020",
       fill = "Harga/m2")+
  scale_fill_gradient(low = "lavenderblush", high = "red3",
                      labels = number_format(scale = 1/1e6, suffix = " Jt.", accuracy = 1))+
  theme_algo_map()


# re-run the `ggplotly`
ggplotly(plot, tooltip = "text") %>% # define tooltip
  config(displayModeBar = F) # hide the modebar

Dive Deeper

Using the same method as above, try to create a map that represent the number of listed properties(total_listings) in each sub-district. Once you’re done, you can also try to customize your own map appearance!

## Your code here

3.2.2 Interactive Mapping with leaflet

Among all available mapping packages in R, leaflet has became the most widely used for building interactive maps in R. It provides a relatively low-level interface to the Leaflet JavaScript library. The full documentation of the package can be accessed here.

Taken from its official documentation5, to create a Leaflet map we can follow these basic steps:

  1. Create a map widget by calling leaflet().
  2. Add layers (i.e., features) to the map by using layer functions (e.g. addTiles, addMarkers, addPolygons) to modify the map widget.
  3. Repeat step 2 as desired.
  4. Print the map widget to display it.

Now, let’s follow the steps above to recreate the previous choropleth as a Leaflet map.

# leaflet
leaflet(df_agg) %>% # create map widget
  addTiles() %>% # add basemap
  addPolygons() # add polygons from `sf` data %

By default, addTiles() used tiles from OpenStreetMap. We can also use third-party basemaps by using addProviderTiles() instead. The lists of complete set basemaps provided by leaflet-providers can be found here.

# list of available provider tiles
names(providers)
#>   [1] "OpenStreetMap"                      
#>   [2] "OpenStreetMap.Mapnik"               
#>   [3] "OpenStreetMap.DE"                   
#>   [4] "OpenStreetMap.CH"                   
#>   [5] "OpenStreetMap.France"               
#>   [6] "OpenStreetMap.HOT"                  
#>   [7] "OpenStreetMap.BZH"                  
#>   [8] "OpenSeaMap"                         
#>   [9] "OpenPtMap"                          
#>  [10] "OpenTopoMap"                        
#>  [11] "OpenRailwayMap"                     
#>  [12] "OpenFireMap"                        
#>  [13] "SafeCast"                           
#>  [14] "Thunderforest"                      
#>  [15] "Thunderforest.OpenCycleMap"         
#>  [16] "Thunderforest.Transport"            
#>  [17] "Thunderforest.TransportDark"        
#>  [18] "Thunderforest.SpinalMap"            
#>  [19] "Thunderforest.Landscape"            
#>  [20] "Thunderforest.Outdoors"             
#>  [21] "Thunderforest.Pioneer"              
#>  [22] "Thunderforest.MobileAtlas"          
#>  [23] "Thunderforest.Neighbourhood"        
#>  [24] "OpenMapSurfer"                      
#>  [25] "OpenMapSurfer.Roads"                
#>  [26] "OpenMapSurfer.Hybrid"               
#>  [27] "OpenMapSurfer.AdminBounds"          
#>  [28] "OpenMapSurfer.ContourLines"         
#>  [29] "OpenMapSurfer.Hillshade"            
#>  [30] "OpenMapSurfer.ElementsAtRisk"       
#>  [31] "Hydda"                              
#>  [32] "Hydda.Full"                         
#>  [33] "Hydda.Base"                         
#>  [34] "Hydda.RoadsAndLabels"               
#>  [35] "MapBox"                             
#>  [36] "Stamen"                             
#>  [37] "Stamen.Toner"                       
#>  [38] "Stamen.TonerBackground"             
#>  [39] "Stamen.TonerHybrid"                 
#>  [40] "Stamen.TonerLines"                  
#>  [41] "Stamen.TonerLabels"                 
#>  [42] "Stamen.TonerLite"                   
#>  [43] "Stamen.Watercolor"                  
#>  [44] "Stamen.Terrain"                     
#>  [45] "Stamen.TerrainBackground"           
#>  [46] "Stamen.TerrainLabels"               
#>  [47] "Stamen.TopOSMRelief"                
#>  [48] "Stamen.TopOSMFeatures"              
#>  [49] "TomTom"                             
#>  [50] "TomTom.Basic"                       
#>  [51] "TomTom.Hybrid"                      
#>  [52] "TomTom.Labels"                      
#>  [53] "Esri"                               
#>  [54] "Esri.WorldStreetMap"                
#>  [55] "Esri.DeLorme"                       
#>  [56] "Esri.WorldTopoMap"                  
#>  [57] "Esri.WorldImagery"                  
#>  [58] "Esri.WorldTerrain"                  
#>  [59] "Esri.WorldShadedRelief"             
#>  [60] "Esri.WorldPhysical"                 
#>  [61] "Esri.OceanBasemap"                  
#>  [62] "Esri.NatGeoWorldMap"                
#>  [63] "Esri.WorldGrayCanvas"               
#>  [64] "OpenWeatherMap"                     
#>  [65] "OpenWeatherMap.Clouds"              
#>  [66] "OpenWeatherMap.CloudsClassic"       
#>  [67] "OpenWeatherMap.Precipitation"       
#>  [68] "OpenWeatherMap.PrecipitationClassic"
#>  [69] "OpenWeatherMap.Rain"                
#>  [70] "OpenWeatherMap.RainClassic"         
#>  [71] "OpenWeatherMap.Pressure"            
#>  [72] "OpenWeatherMap.PressureContour"     
#>  [73] "OpenWeatherMap.Wind"                
#>  [74] "OpenWeatherMap.Temperature"         
#>  [75] "OpenWeatherMap.Snow"                
#>  [76] "HERE"                               
#>  [77] "HERE.normalDay"                     
#>  [78] "HERE.normalDayCustom"               
#>  [79] "HERE.normalDayGrey"                 
#>  [80] "HERE.normalDayMobile"               
#>  [81] "HERE.normalDayGreyMobile"           
#>  [82] "HERE.normalDayTransit"              
#>  [83] "HERE.normalDayTransitMobile"        
#>  [84] "HERE.normalDayTraffic"              
#>  [85] "HERE.normalNight"                   
#>  [86] "HERE.normalNightMobile"             
#>  [87] "HERE.normalNightGrey"               
#>  [88] "HERE.normalNightGreyMobile"         
#>  [89] "HERE.normalNightTransit"            
#>  [90] "HERE.normalNightTransitMobile"      
#>  [91] "HERE.reducedDay"                    
#>  [92] "HERE.reducedNight"                  
#>  [93] "HERE.basicMap"                      
#>  [94] "HERE.mapLabels"                     
#>  [95] "HERE.trafficFlow"                   
#>  [96] "HERE.carnavDayGrey"                 
#>  [97] "HERE.hybridDay"                     
#>  [98] "HERE.hybridDayMobile"               
#>  [99] "HERE.hybridDayTransit"              
#> [100] "HERE.hybridDayGrey"                 
#> [101] "HERE.hybridDayTraffic"              
#> [102] "HERE.pedestrianDay"                 
#> [103] "HERE.pedestrianNight"               
#> [104] "HERE.satelliteDay"                  
#> [105] "HERE.terrainDay"                    
#> [106] "HERE.terrainDayMobile"              
#> [107] "FreeMapSK"                          
#> [108] "MtbMap"                             
#> [109] "CartoDB"                            
#> [110] "CartoDB.Positron"                   
#> [111] "CartoDB.PositronNoLabels"           
#> [112] "CartoDB.PositronOnlyLabels"         
#> [113] "CartoDB.DarkMatter"                 
#> [114] "CartoDB.DarkMatterNoLabels"         
#> [115] "CartoDB.DarkMatterOnlyLabels"       
#> [116] "CartoDB.Voyager"                    
#> [117] "CartoDB.VoyagerNoLabels"            
#> [118] "CartoDB.VoyagerOnlyLabels"          
#> [119] "CartoDB.VoyagerLabelsUnder"         
#> [120] "HikeBike"                           
#> [121] "HikeBike.HikeBike"                  
#> [122] "HikeBike.HillShading"               
#> [123] "BasemapAT"                          
#> [124] "BasemapAT.basemap"                  
#> [125] "BasemapAT.grau"                     
#> [126] "BasemapAT.overlay"                  
#> [127] "BasemapAT.highdpi"                  
#> [128] "BasemapAT.orthofoto"                
#> [129] "nlmaps"                             
#> [130] "nlmaps.standaard"                   
#> [131] "nlmaps.pastel"                      
#> [132] "nlmaps.grijs"                       
#> [133] "nlmaps.luchtfoto"                   
#> [134] "NASAGIBS"                           
#> [135] "NASAGIBS.ModisTerraTrueColorCR"     
#> [136] "NASAGIBS.ModisTerraBands367CR"      
#> [137] "NASAGIBS.ViirsEarthAtNight2012"     
#> [138] "NASAGIBS.ModisTerraLSTDay"          
#> [139] "NASAGIBS.ModisTerraSnowCover"       
#> [140] "NASAGIBS.ModisTerraAOD"             
#> [141] "NASAGIBS.ModisTerraChlorophyll"     
#> [142] "NLS"                                
#> [143] "JusticeMap"                         
#> [144] "JusticeMap.income"                  
#> [145] "JusticeMap.americanIndian"          
#> [146] "JusticeMap.asian"                   
#> [147] "JusticeMap.black"                   
#> [148] "JusticeMap.hispanic"                
#> [149] "JusticeMap.multi"                   
#> [150] "JusticeMap.nonWhite"                
#> [151] "JusticeMap.white"                   
#> [152] "JusticeMap.plurality"               
#> [153] "Wikimedia"                          
#> [154] "GeoportailFrance"                   
#> [155] "GeoportailFrance.parcels"           
#> [156] "GeoportailFrance.ignMaps"           
#> [157] "GeoportailFrance.maps"              
#> [158] "GeoportailFrance.orthos"            
#> [159] "OneMapSG"                           
#> [160] "OneMapSG.Default"                   
#> [161] "OneMapSG.Night"                     
#> [162] "OneMapSG.Original"                  
#> [163] "OneMapSG.Grey"                      
#> [164] "OneMapSG.LandLot"

To access the supported tile providers, we can also useproviders$ and choose from the options:

leaflet(df_agg) %>% 
  addProviderTiles(providers$Esri.NatGeoWorldMap) %>%  
  addPolygons()

Let’s now color the polygons based on the house pricing. First, we need to create a color palette to represent the data. In the chunk below, we create an object called pal which stores the colors generated from df_agg$harga_m2 values. The value passed to fill in palette is provided by ColorBrewer2 palettes.

Notice that on the next line, inside the addPolygons() function call, we also add some parameters:

  • fillColor: the values and palette to be mapped
  • fillOpacity: the fillColor opacity
  • weight: thickness of the mapped polygons
  • color: color of the mapped polygons
pal <- colorNumeric(palette = "Reds", domain = df_agg$harga_m2)

leaflet(df_agg) %>% 
  addProviderTiles(providers$Esri.NatGeoWorldMap) %>%  
  addPolygons(
    fillColor = ~pal(harga_m2),
    fillOpacity = .8,
    weight = 2,
    color = "white"
  )

To add more interaction to our map, we can also make the polygons highlight as the mouse passes over them. We can do it by defining the highlight argument inside the addPolygons() function:

leaflet(df_agg) %>% 
  addProviderTiles(providers$Esri.NatGeoWorldMap) %>%  
  addPolygons(
    fillColor = ~pal(harga_m2),
    fillOpacity = .8,
    weight = 2,
    color = "white",
    highlight = highlightOptions(
      weight = 5,
      color = "black",
      bringToFront = TRUE,
      opacity = 0.8
    )
  )

Now let’s also add labels information as we hover over each sub-district mapped. To add the label, we need to create a HTML object using htmltools::HTML which stores the value of information we desired to be shown:

labels <- glue::glue("
  <b>{df_agg$kecamatan}, {df_agg$kota}</b>:<br> {round(df_agg$harga_m2/1e6, 2)} jt/m2"
) %>% lapply(htmltools::HTML)

Then, to map it, we can add label argument inside the addPolygons() function.

leaflet(df_agg) %>% 
  addProviderTiles(providers$Esri.NatGeoWorldMap) %>%  
  addPolygons(
    label = labels,
    fillColor = ~pal(harga_m2),
    fillOpacity = .8,
    weight = 2,
    color = "white",
    highlight = highlightOptions(
      weight = 5,
      color = "black",
      bringToFront = TRUE,
      opacity = 0.8
    )
  )

As the final step, we might also need to add a legend to give information about the colors and intervals. To add the legend, we only a layer of addLegend() function:

leaflet(df_agg) %>% 
  addProviderTiles(providers$Esri.NatGeoWorldMap) %>%  
  addPolygons(
    label = labels,
    fillColor = ~pal(harga_m2),
    fillOpacity = .8,
    weight = 2,
    color = "white",
    highlight = highlightOptions(
      weight = 5,
      color = "black",
      bringToFront = TRUE,
      opacity = 0.8
    )
  ) %>% 
  addLegend(
    pal = pal,
    values = ~harga_m2,
    opacity = 1,
    title = "Harga/m2",
    position = "bottomright"
  )

Dive Deeper

  1. Suppose we want to create a border to separate areas under the Jakarta Area. Recall how you can perform aggregation over the vector attributes and fill down the code skeleton (______) below to get the geometry for our Jakarta border!
# Copy down this code to the chunk below:

border  <- df_agg %>% 
  filter( ______== ______) %>% 
  group_by(______) %>% 
  ______() 
## Your code here
  1. Now that you have stored the border information, add another layer on our most recent mapping code to map the border of Jakarta area. Again, you can use the skeleton below to prepare the layer you need:
addPolylines(
    data = ______,
    color = "darkred",
    opacity = .8,
    weight = 2.5
  )
## Your code here
  1. Once your done, feel free to modify the code elements to create your own map customization. For your reference, you can go to leaflet for R’s documentation here!
## Your code her

Complete Code:

pal <- colorNumeric(palette = "Reds", domain = df_agg$harga_m2)

labels <- glue::glue("
  <b>{df_agg$kecamatan}, {df_agg$kota}</b>:<br> {round(df_agg$harga_m2/1e6, 2)} jt/m2"
) %>% lapply(htmltools::HTML)

border  <- df_agg %>% 
  filter(NAME_1 == "Jakarta Raya") %>% 
  group_by(NAME_1) %>% 
  summarise() 

leaflet(df_agg) %>% 
  addProviderTiles(providers$Esri.NatGeoWorldMap) %>%  # using `addProviderTiles()` instead of `addTiles()`
  addPolygons(
    label = labels,
    labelOptions = labelOptions(
      style = list(
        "font-size"="13px",
        "background-color"="black",
        "color"="white"
      )
    ),
    weight = 2,
    color = "white",
    fillOpacity = .8,
    fillColor = ~pal(harga_m2),
     highlight = highlightOptions(
    weight = 5,
    color = "black",
    bringToFront = TRUE,
    sendToBack = TRUE,
    opacity = 0.8)
  ) %>% 
  addPolylines(
    data = border,
    color = "darkred",
    opacity = .8,
    weight = 2.5
  ) %>% 
  addLegend(
    pal = pal,
    values = ~harga_m2,
    opacity = 1,
    title = "Harga/m2",
    position = "bottomright"
  ) %>%
  fitBounds(106.686211, -6.370783, 106.972824, -6.089036)

4 More Thematic Maps

In general, spatial data can be represented either as reference maps or thematic maps. While reference maps emphasizes the location of objects in the world, thematic maps shows the spatial variability of a specific distribution. The house pricing distribution map we created earlier is among the most frequently used thematic map types in geospatial data, which is the Choropleth map.

As an example, let’s take a look at this dataset of Perumahan (housing estate) locations in Jakarta area:

perum <- read.csv('data/perumahan.csv')
head(perum)
#>               perumahan            kota  latitude longitude
#> 1 Bintaro Jaya Sektor 2 Jakarta Selatan -6.278108  106.7522
#> 2        Buaran Regency   Jakarta Timur -6.232117  106.9251
#> 3    Casa Permata Hijau Jakarta Selatan -6.221427  106.7832
#> 4     Cempaka Residence Jakarta Selatan -6.291828  106.7779
#> 5       Green Lake City   Jakarta Barat -6.177477  106.7113
#> 6          Green Lontar Jakarta Selatan -6.364691  106.7983

Let’s try to create a dot density map using leaflet’s addCircles() function:

leaflet(perum) %>% 
  addTiles() %>% 
  addCircles(label = ~perumahan)

Another common way of representing geographical density map is by using a heatmap. To create a heatmap in leaflet, we can use leaflet.extras package:

library(leaflet.extras)

leaflet(perum) %>% 
  addTiles() %>%
  addCircles(
    label = ~perumahan,
    color = "red"
  ) %>% 
  addHeatmap(
    radius = 10
  ) 

5 [Optional:] Other Mapping Packages

Geospatial mapping is an active area in R community, which makes R loaded with many other packages that supported spatial visualization. Packages like cartography, mapview, ggspatial, rasterVis and many other libraries that you can explore. Other than ggplot2 and leaflet that we used in this workshop, tmap is also one of the most popular package used to create both static and interactive maps in R.

tmap also works similarly with ggplot2, since it is also based on the idea of Grammar of Graphics6:

library(tmap)

tm_shape(df_agg)+
  tm_polygons("harga_m2")

Just like how you can use facets in ggplot2, you can also create it with tm_facets(), for example, to split our map based on each city:

tm_shape(df_agg) +
    tm_polygons("harga_m2") +
    tm_facets(by = "kota")

tmap also has a nice feature that allows us to add interactivity by switching from “plot” to “view” mode like so:

tmap_mode("view")

tm_shape(df_agg)+
  tm_polygons("harga_m2") +
  tm_bubbles("total_listings")

6 Publishing your Map

7 References